1  Productivity Measures

•  Papers  submitted  but  not  yet  accepted:  6 

•  Refereed papers accepted and in press: 17

•  Refereed papers published: 15

•  Books submitted or published: 0

•  Book chapters or other articles: 3

•  Ph.D. dissertations: 3

•  Patents filed or granted: 0

•  Invited presentations: 15

•  Contributed presentations: 5

•  Honors,  Prizes,  Awards  and  Professional  Activities: 

•  John Lehoczky:

•  Associate  Editor,  Journal  of  Real-Time  Systems, 


•  Member of the program committee of the 14th IEEE Real-Time Systems
Symposium, the 1993 ICDCS and the second Rate Monotonic Users Conference.


•  Member, NIH Special Study Section on Statistics (Chair, 1993)

•  Lui Sha:

•  Member  NASA  Space  Station  Advisory  Committee, 

•  Chairman of the Board of Visitors of RICIS, an R&D center established by
NASA and NASA JSC at University of Houston at Clear Lake.

•  General  chair,  13th  IEEE  Real-Time  Systems  Symposium, 

•  Associate  Editor,  Real-Time  Systems 


•  Associate Editor, IEEE Computer

•  The paper "Distributed System Design Using Generalized Rate Monotonic
Theory," by L. Sha and S. Sathaye, published in Proceedings of the 2nd
International Conference on Automation, Robotics, and Computer Vision, 1992,
was selected as one of the most innovative papers. An updated version will be
published again in a Special Issue of the Journal of Integrated Computer-Aided
Engineering in 1993.

•  Jay Strosnider:

•  Promoted to associate professor of electrical and computer engineering, July 1993.

•  Member of the Program Committee, 13th IEEE Real-Time Systems Symposium

•  Program chair, Workshop on Real-Time Multimedia Systems, December 1993.

•  Hide Tokuda:

•  Program Committee Member:

•  11th IEEE Workshop on Real-Time Operating Systems and Software

•  International Symposium on Object Technologies for Advanced Software
(ISOTAS ’93)

•  WISS'93 (Workshop on Interactive Systems and Software), JSSST

•  3rd International Workshop on Responsive Computer Systems

•  12th IEEE Workshop on Real-Time Operating Systems and Software

•  6th EUROMICRO Workshop on Real Time Systems

•  Graduate  students  supported:  2 

•  Post-docs supported: 0

•  Minorities supported: 0
2  Summary of Technical Progress


2.1  Overview 

The ART (Advanced Real-Time Technology) Project of Carnegie Mellon University is engaged in
wide ranging research on hard real-time systems. The project has as its overall goal the development and
demonstration of predictable and fault tolerant hard real-time computer systems. To achieve this goal,
research is being conducted in three interrelated areas:

1.  The development of a theory of hard real-time resource management which includes
processors, operating systems and communications, and which will permit the straightforward
integration of predictable systems using open system standards.

2.  The design and construction of operating systems that support the theory of hard real-time
resource management.

3.  The design of fault tolerance techniques, including hardware and software fault tolerance
using temporal redundancy and analytic redundancy, to permit the construction of real-time
systems whose performance and dependability are predictable.

The ART Project is supported, in part, by three distinct ONR Contracts (N00014-93-J-1771,
N00014-92-J-1524 and N00014-91-J-1304). In this report, we describe progress for the principal
investigators supported by these three contracts.

During the October 1, 1992 - September 30, 1993 period, substantial progress was made in each of
these broad categories. Only the progress on real-time resource management and temporal redundancy for
fault tolerance is briefly described below. A more detailed collection of briefing materials for the entire
project is contained in the yearly ART Project Review provided to ONR representatives.

In July 1993, NGCR asked us to 1) evaluate the real-time extension to the IEEE Scalable Coherent
Interface, an advanced computer backplane that can support multiple topologies using fiber optic
connections, and 2) lead the technical effort for the Navy's next generation high performance network. The
theoretical work developed by ART Project researchers will serve as a foundation for these efforts.


2.2  Integrating  Scheduling  and  Fault  Tolerance 

Over the last year, substantial progress was made on the integration of real-time scheduling with fault
tolerance to create a theory of temporal redundancy. Temporal redundancy is an approach to real-time
system dependability in which subtasks that are part of tasks with real-time requirements, but that
fail their acceptance test, can be executed again. The re-execution is scheduled so that, if successful,
the task will meet its timing requirements and the re-execution does not cause any other task to miss its
deadline. The failures occur randomly; thus they create, in effect, a stream of aperiodic job requests. The
job requests correspond to the time required to retry the subtask, or an alternative version of the subtask,
and they have a deadline which is the same as the deadline of the failed task. The temporal redundancy
problem, therefore, can be considered to be a special version of the problem of jointly scheduling hard
deadline periodic tasks and aperiodic tasks. However, in this case, the aperiodic tasks also have hard
deadlines, a problem which had not previously been addressed in the rate monotonic environment. The goal
of the research is to develop methods to solve this new joint scheduling problem and to assess the efficacy
of the algorithms produced in enhancing real-time system fault tolerance. The joint scheduling problem with
hard deadline aperiodic tasks was solved in the recent paper by Ramos-Thuel and Lehoczky to appear in
the 1993 Real-Time Systems Symposium. The use of these methods to provide the largest possible
temporal redundancy was studied in the Ph.D. dissertation of Ramos-Thuel.
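
The mapping from a failed subtask to an aperiodic job request can be pictured with a minimal sketch. The class and field names below are hypothetical and are not taken from the project software; the sketch only assumes that a recovery request carries the retry time as its cost and inherits the absolute deadline of the failed task, as described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PeriodicJob:
    name: str
    deadline: int    # absolute deadline of this job instance
    retry_cost: int  # time needed to retry the subtask (or run an alternate version)


@dataclass(frozen=True)
class RecoveryRequest:
    source: str
    arrival: int
    cost: int
    deadline: int


def on_acceptance_test_failure(job, now):
    """Turn a failed job into a hard deadline aperiodic request (illustrative only)."""
    return RecoveryRequest(job.name, now, job.retry_cost, job.deadline)


def laxity(req):
    """Time to spare if the retry ran immediately and without interference."""
    return req.deadline - req.arrival - req.cost


job = PeriodicJob("nav_update", deadline=10, retry_cost=3)
req = on_acceptance_test_failure(job, now=6)
print(req.deadline, laxity(req))  # 10 1
```

A request with negative laxity cannot be recovered under any scheduling policy; the interesting cases are those with small positive laxity, where admission depends on the slack left by the other tasks.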

Temporal redundancy in real-time systems requires that time be allocated to aperiodic tasks in such a
way that they can meet their timing requirements without causing any non-failed task to miss its deadline.
Two static allocation algorithms for fixed-priority systems have been proposed: the Private
Reservation Algorithm (PRA), which reserves time that is bound to the recovery of individual tasks, and
the Communal Reservation Algorithm (CRA), which reserves a pool of time available to recovery
operations on a first-come first-served basis. The PRA permits certain tasks to have guaranteed recovery
properties; however, the absence of resource sharing makes this algorithm inefficient in the sense that
some tasks receive no additional coverage. The CRA provides resource sharing, but the pool of available
time is created under worst case conditions. Consequently, while this ensures that no other tasks will
miss their deadlines, the conservative calculations create situations in which sufficient time is available for
recovery, but the time provided by the CRA is inadequate.
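
The contrast between the two static schemes can be illustrated with a small sketch. It assumes a simplified model that is not taken from the report: each recovery request names the failed task and the retry time it needs, the PRA holds a separate budget per task, and the CRA holds one shared pool served first-come first-served. The task names, budget values and function names are hypothetical, and budget replenishment is ignored.

```python
def pra_admit(requests, private_budget):
    """Private reservation: a request may only draw on its own task's budget."""
    budget = dict(private_budget)
    accepted = []
    for task, cost in requests:
        if budget.get(task, 0) >= cost:
            budget[task] -= cost
            accepted.append((task, cost))
    return accepted


def cra_admit(requests, pool):
    """Communal reservation: all requests draw on one shared pool, in arrival order."""
    accepted = []
    for task, cost in requests:
        if pool >= cost:
            pool -= cost
            accepted.append((task, cost))
    return accepted


# Hypothetical arrival sequence: (failed task, retry time needed).
requests = [("t1", 2), ("t3", 2), ("t1", 2)]
private = {"t1": 2, "t2": 2}  # t3 has no private reservation

print(pra_admit(requests, private))  # [('t1', 2)]
print(cra_admit(requests, pool=4))   # [('t1', 2), ('t3', 2)]
```

With the same total reservation of 4 time units, the private scheme leaves the budget of t2 unused while t3 goes unserved, whereas the shared pool serves whichever requests arrive first. In practice the communal pool would be sized under worst case assumptions, so it could well be smaller than the sum of the private budgets.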

The approach to improve upon the conservative CRA is to use the slack stealing algorithm. This
algorithm makes detailed calculations of the slack that is available during any interval of time using the
exact schedulability equations associated with fixed priority scheduling algorithms. When any aperiodic
task is ready for execution, an exact calculation is made to see if there is sufficient time available to
execute that task without missing any other deadlines. The performance of the slack stealing algorithm is
different for hard deadline aperiodics than it is for soft deadline aperiodics. In the former case, there is no
strongly optimal scheduling algorithm. Because the scheduling problem is on-line and the aperiodics
have hard deadlines, a decision to accept one aperiodic task for processing may entail rejecting another
task. A different algorithm may not be able to accommodate the first task but is, therefore, able to
accommodate the second. This makes the performance of these two algorithms incomparable. A second
difference is that there is no optimal priority level at which to process the aperiodic tasks, whereas for soft
deadline aperiodics, it is optimal to execute them at the highest priority level. This choice introduces
additional variability into the algorithmic structure. Nevertheless, the slack stealing algorithm is far
superior to the PRA and CRA algorithms.
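
The flavor of an exact schedulability calculation can be sketched as follows. This is not the slack stealing algorithm itself; it is a minimal illustration of the standard fixed-priority response-time recurrence, used here as a conservative admission check. It assumes integer execution times, periods and deadlines, deadlines no longer than periods, tasks listed from highest to lowest priority, and a single aperiodic job served at the highest priority and released together with every periodic task. The task set and function names are hypothetical.

```python
def ceil_div(a, b):
    """Integer ceiling of a / b for non-negative a and positive b."""
    return -(-a // b)


def response_time(tasks, i, extra=0):
    """Worst-case response time of tasks[i], or None if it exceeds the deadline.

    tasks: list of (C, T, D) tuples, highest priority first, with D <= T.
    extra: one-shot top-priority work released at the same instant (assumption).
    """
    c_i, _, d_i = tasks[i]
    t = extra + c_i
    while True:
        # Processor demand by time t from the extra work, task i, and all
        # higher-priority periodic tasks.
        demand = extra + c_i + sum(ceil_div(t, p) * c for c, p, _ in tasks[:i])
        if demand > d_i:
            return None
        if demand == t:
            return t
        t = demand


def can_admit(tasks, extra):
    """True if every periodic task still meets its deadline with the extra work."""
    return all(response_time(tasks, i, extra) is not None for i in range(len(tasks)))


tasks = [(1, 4, 4), (2, 6, 6), (3, 12, 12)]  # hypothetical (C, T, D) values
print(response_time(tasks, 2))  # 10
print(can_admit(tasks, 2))      # True
print(can_admit(tasks, 3))      # False
```

In this example the lowest priority task finishes 2 time units before its deadline in the worst case, and 2 units is also the largest retry that can be admitted at top priority. A run-time slack calculation would typically find more slack than this worst-case check, since it accounts for the actual phasing of the periodic tasks.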


Unfortunately, in some cases the slack stealing method introduces a large memory and scheduling
overhead. A direct extension of slack stealing for the hard aperiodic scheduling case yields a worst-case
scheduling overhead of n^, where n is the number of periodic tasks.

To reduce the implementation overhead of the slack stealing method, an algorithm called the Myopic
Slack Management (MSM) algorithm was introduced. Although the MSM algorithm is also based on the
concept of slack stealing, the memory overhead is reduced by using conservative estimates of the slack
available for each periodic task at run-time. To make slack estimation computationally feasible, the
accumulation of available slack is restricted to relatively short intervals of time. As a result, the MSM
algorithm is said to be nearsighted, or myopic, in its ability to accumulate slack; on the other hand, it
is not as conservative as the CRA. The scheduling overhead is reduced by restricting the service of hard
aperiodic tasks to a maximum of two priority levels: that of the failed periodic task issuing the recovery
request, and the deadline monotonic priority level for the aperiodic. These techniques reduced the
memory and scheduling overheads to a worst-case complexity of n.

The performance of the MSM algorithm may be lower than that expected from a direct implementation
of the slack stealing approach because of its tendency to underestimate the slack available and the
limitations imposed on the priority levels considered for service. However, the MSM trades off some of
the performance of the slack stealing method for a scheduling solution which has significantly less
overhead. A quantitative comparison of the performance of the static and dynamic allocation strategies
was performed. Specifically, the comparison studies included the static Private and Communal
Reservation Algorithms and the dynamic Myopic Slack Management algorithm.

To measure the effectiveness of an allocation algorithm, a metric referred to as recovery coverage was
introduced. Recovery coverage parallels the well-known concept of error detection coverage. It was
empirically computed as the percentage of recovery requests accepted for service relative to the total
number of recovery requests issued during a simulation. Under no conditions was the service of a
recovery request allowed to jeopardize the timing correctness of any fault-free application task.
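
As a minimal illustration of the metric, the sketch below computes recovery coverage from a log of admission decisions. The function name and the log format (one boolean per recovery request issued) are assumptions made for this example, not the simulator's actual interface.

```python
def recovery_coverage(decisions):
    """Percentage of issued recovery requests that were accepted for service."""
    if not decisions:
        raise ValueError("no recovery requests were issued")
    return 100.0 * sum(decisions) / len(decisions)


# Hypothetical simulation log: True means the request was accepted.
log = [True, True, False, True]
print(recovery_coverage(log))  # 75.0
```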

Analytical models for predicting the coverage provided by the PRA and the CRA were derived. The
predicted coverage for the PRA was shown to match the empirical results very closely. The prediction
model for the CRA was shown to be optimistic, but it offered insights in explaining the performance
behavior of this algorithm. All simulation results obtained for the application workloads considered were
consistent. The MSM algorithm proved to be very robust to changes in periodic loading conditions and to
increases in the size of the transient recovery load. The preallocation algorithms rarely came close to
providing the high coverage observed for the MSM algorithm. Although the coverage estimates for the
PRA remained stable as the size of the transient recovery load was increased, its coverage was highly
sensitive to the periodic load. The CRA, on the other hand, was less sensitive to changes in the periodic
load, but its performance degraded significantly as the size of the recovery load was increased. In general,
the performance of these preallocation algorithms was competitive with that of the MSM algorithm only
in cases in which the joint processing load was small.

Most of the reported coverage values represent steady-state performance estimates, that is, estimates of
the coverage provided by an algorithm when the transient recovery load is observed to persist for an
infinite period of time. Since transient recovery loads only exist for short periods of time, the sensitivity
of coverage to finite transient durations was investigated. Results showed that although coverage tends to
increase as the duration of the transient decreases, the rate of change is very small. Hence, steady-state
coverage values can be considered adequate approximations to the coverage observed for finite-duration
transients, albeit slightly conservative.


The research presented above was done in the context of temporal redundancy, where the hard deadline
aperiodic tasks arise when hard deadline periodic tasks fail their acceptance test. This creates a context in
which the deadlines are relatively short. The methodology is also applicable in situations with longer
aperiodic deadlines. We are currently comparing the performance of fixed priority and dynamic priority
slack stealing algorithms.
3  Transitions and DoD Interactions

ART project personnel frequently interact with DoD representatives, especially Lui Sha in his dual role
as a member of the ART project and of the CMU SEI. Dr. Sha is deeply involved with transitioning rate
monotonic scheduling theory to industry and government. His efforts include:

•  Member, NASA Space Station Advisory Committee,

•  Interaction with the Navy NGCR,

•  Named Chairman of the Board of Visitors of RICIS, an R&D center established by NASA
and NASA JSC at University of Houston at Clear Lake,

•  Coordinated the real-time version of POSIX,

•  Worked with IEEE 802.6 standards group to develop a real-time capability.

In addition, Hide Tokuda, as developer of ARTS (and Real-Time Mach), interacts regularly with
NOSC, IBM and University of Virginia to coordinate the development of testbeds at all four sites and
experimentation with ARTS.

As a part of our software fault tolerance effort supported by N00014-92-J-1524, we have interacted
with MITRE Corporation to investigate the use of analytic redundancy for airborne radar tracking
systems.

Finally,  the  rate  monotonic  scheduling  theory  is  increasingly  being  adopted  by  major  projects.  These 
projects  include: 

•  Navy BSY-1 and BSY-2,

•  NASA  Space  Station  Freedom  (for  system  integration), 

•  European Space Station (recommended its use for its Hard Real-Time OS project).

Jay Strosnider interacts frequently with the NRaD San Diego Distributed Combat Control project,
transitioning technology into Navy lab testbeds in San Diego. He also interacts with IBM, Bellcore and
Intel on commercial applications of the developed technologies.
4  Software and Hardware Prototypes

A variety of hardware and software prototypes are being developed as a part of the project and have
been extensively reported, including the ARTS and RT-Mach operating systems. The newest hardware
prototypes involve experimental testbeds to test analytic redundancy as a method of achieving software
fault tolerance. These experiments are reported in the annual report for ONR Contract
N00014-92-J-1524.